Abstract


We demonstrate how many basic statistical inference problems (including
the non-parametric one sample and multi-sample univariate and multivariate
inference problems as well as time series problems) can be formulated as a
hypothesis that a suitable distribution function $D(u)$, $0 \le u \le 1$, satisfies

$$D(u) = u, \quad 0 \le u \le 1.$$

From the data one can construct a raw estimator $\tilde D(u)$ of $D(u)$, which
has the property that asymptotically (as the sample size tends to $\infty$), under the null
hypothesis that $D(u) = u$, the (suitably normalized) deviation $[\tilde D(u) - u]$,
$0 \le u \le 1$, is a Brownian bridge stochastic process. A conventional statistical
approach would be: test the hypothesis $D(u) = u$ by examining the significance of
the deviation from zero of various functionals of $\tilde D(u) - u$.

The time series theoretic approach is to consider the density
$d(u) = D'(u)$, $0 \le u \le 1$, and the Fourier-Stieltjes transform

$$\varphi(v) = \int_0^1 e^{2\pi i u v}\, dD(u), \quad v = 0, \pm 1, \ldots$$

and to estimate them. Raw estimators are given by $\tilde d(u) = \tilde D'(u)$, and

$$\tilde\varphi(v) = \int_0^1 e^{2\pi i u v}\, d\tilde D(u), \quad v = 0, \pm 1, \ldots$$

Usually $\tilde d(u)$ is a very wiggly curve; one then seeks a smooth curve $\hat d(u)$
which is a good (or "best") estimator of $d(u)$.

To test whether $D(u) = u$, $0 \le u \le 1$, one could test equivalently
whether $\varphi(v) = 0$ for $v \ne 0$ (for example, by plotting $|\tilde\varphi(v)|^2$ as a
function of $v = 1, 2, \ldots$ and determining if any of them are significantly
different from zero) or whether $d(u) = 1$, $0 \le u \le 1$ (for example, by
determining if the divergence of the smoothed density

$$\hat\Delta = \int_0^1 \{\hat d(u) - 1\} \log \hat d(u)\, du$$

is significantly different from zero).

A method of estimating $d(u)$ without making any prior assumptions about
its behavior can be obtained using a time series prediction theoretic
autoregressive approach. The "time series identification" problem is to determine
if there exists a difference equation of suitable order $m$ which the sequence
$\varphi(v)$ satisfies:

$$\varphi(v) + a_1 \varphi(v - 1) + \cdots + a_m \varphi(v - m) = 0, \quad v = 1, 2, \ldots$$

where $a_m \ne 0$. Then to test whether $D(u) = u$, $0 \le u \le 1$, one could test
whether $m = 0$. I call the problem of estimating $m$ the problem of order
determination of an approximating autoregressive scheme. I propose a function
of $m$ (called CAT($m$), for criterion autoregressive transfer function)
which can be computed from the data and is used to estimate the best order $m$
as follows: take $\hat m$ to be the value of $m$ at which CAT($m$) achieves its
minimum value. When $\hat m = 0$ we could accept the hypothesis that $D(u) = u$
or equivalently $d(u) = 1$; when $\hat m > 0$ the value of $\hat m$ is used to form the
"optimum" smooth estimator $\hat d(u)$ of $d(u)$.

When testing the fit of a model it seems desirable to use a test which
indicates how to fix the model when it is found not to fit. To adapt an
aphorism, such a model-testing procedure is said to have "the seeds of its
own construction (rather than only destruction)."

Preface 

The  typical  problem  facing  the  applied  statistician  (the  applied  statistics 
problem?)  has  been  described  [Easterling  (1976)]  as  follows:  "given  some  data, 
including  information  about  how  the  data  were  obtained,  what  probability  model(s), 
including  parameter  values,  can  be  found  which  adequately  explain,  or  describe, 
the  data?" 

I would call the foregoing a statistical science problem, and would
describe it succinctly to be: "model probabilities from data." A routine
applied statistics problem could be formulated: "infer parameters of
probability laws from data." Statisticians might not disagree that the aim of
statistics should be to model probabilities by identifying (rather than
assuming) their probability laws but they might doubt whether such an aim
can be realized in practice, especially with small samples.

The  aim  of  this  paper  is  to  propose  an  approach  to  non-parametric 
statistical  continuous  data  science  which  seems  to  be  consistent  with  the 
conventional  theories  and  methods  of  non-parametric  inference  but  seems  to 
point  the  way  to  universally  applicable  procedures  (for  continuous  data)  which 
are  asymptotically  as  efficient  as  the  best  conventional  goodness  of  fit  and 
parameter  estimation  procedures  available  for  each  particular  problem.  We 
have  programmed  the  methods  described  and  found  them  successful  in  test  cases. 
However, in the space available to this paper we are only able to discuss
(without proofs or examples) "Chapter 1" of our work which outlines the "ideas":
how the basic general applied problems of statistical inference can be
formulated as problems of estimation of distribution functions on the unit interval
(or the unit hyper-cube), how such problems are more fruitfully treated as
density estimation problems, and how, to solve density estimation problems,
one can use the method which is the essence of the highly successful maximum
likelihood method of parameter estimation: using a suitable information-
theoretic divergence distance between densities, find the "smooth" density
which is closest to a "raw" estimator of the density.




Chapter  1 


DENSITY  ESTIMATION  FORMULATION  OF  BASIC  STATISTICAL 
INFERENCE  PROBLEMS 


The  aim  of  this  chapter  is  to  introduce  a single  canonical  problem  to 
which  one  can  transform  many  basic  statistical  inference  and  statistical 
data  analysis  problems.  This  canonical  problem  is  most  simply  described  as 
the  problem  of  testing  for  white  noise  via  density  estimation  or  smoothing. 
We  first  state  some  of  the  inference  problems  which  we  seek  to  unify. 

One-sample (univariate) inference problems. Let $X_1, \ldots, X_n$ be i.i.d.
(independent identically distributed) random variables with common a.c.
(absolutely continuous) d.f. (distribution function) $F(x)$ and probability
density function $f(x)$. One seeks to efficiently:

(i) estimate $f(x)$ non-parametrically (without making any prior
assumption about its functional form);

(ii) test for a specified probability density $f_0(x)$ whether there
exist constants $\mu$ and $\sigma$ such that

$$f(x) = \frac{1}{\sigma}\, f_0\!\left(\frac{x - \mu}{\sigma}\right);$$

(iii) estimate the parameters $\mu$ and $\sigma$ (called location and scale
parameters).

Two-sample (univariate) inference problems. Let $X_1, \ldots, X_m$ be i.i.d.
with common a.c. d.f. $F(x)$ and let $Y_1, \ldots, Y_n$ be i.i.d. with common a.c.
d.f. $G(x)$. One seeks to efficiently:

(i) test whether there exist constants $\mu$ and $\sigma$ such that

$$G(x) = F\!\left(\frac{x - \mu}{\sigma}\right);$$

(ii) estimate $\mu$ and $\sigma$.

One-sample multivariate inference problems. Let $\mathbf{X} = (X_1, \ldots, X_d)$
be a random vector with absolutely continuous multivariate distribution
function $F(x_1, \ldots, x_d)$ and density $f(x_1, \ldots, x_d)$; let $\mathbf{X}_1, \ldots, \mathbf{X}_n$ be a
random sample. One seeks to efficiently:

(i) test whether the components are independent random
variables,

(ii) estimate the multivariate density $f$,

(iii) estimate the regression function

$$E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}].$$

In addition, there are multi-sample univariate inference problems and
multi-sample multivariate inference problems concerned with the equality of
many distributions; however, they are not discussed in this paper.

A CANONICAL PROBLEM (OF DENSITY ESTIMATION AND TESTING FOR CONSTANT
DENSITY): One seeks to form, from "raw" estimators $\tilde D(u)$, $\tilde d(u)$, $\tilde\varphi(v)$,
"optimal" estimators $\hat D(u)$, $\hat d(u)$, $\hat\varphi(v)$ of unknown functions $D(u)$, $d(u)$,
$\varphi(v)$ where (i) $D(u)$, $0 \le u \le 1$, is an absolutely continuous distribution
function on the unit interval satisfying $D(0) = 0$, $D(1) = 1$;
(ii) $d(u) = D'(u)$ is its density function satisfying $\log d(u)$ and
$d^{-1}(u)$ are integrable functions on $0 \le u \le 1$; (iii) $\varphi(v)$ is the Fourier-
Stieltjes transform

$$\varphi(v) = \int_0^1 e^{2\pi i u v}\, dD(u)$$

satisfying conditions such as $\sum_{v=-\infty}^{\infty} |\varphi(v)| < \infty$ and more generally, for some
$r > 0$, $\sum_{v=-\infty}^{\infty} |v|^r\, |\varphi(v)| < \infty$.

One often defines $D(u)$, $d(u)$, $\varphi(v)$ so that a "null" hypothesis is
equivalent to "white noise" in the sense that the null hypothesis is
equivalent to the following three equivalent conditions:

$$D(u) = u, \quad 0 \le u \le 1;$$

$$d(u) = 1, \quad 0 \le u \le 1;$$

$$\varphi(v) = 0 \quad \text{for } v \ne 0.$$

The raw estimators are usually obtained in practice by forming first
either $\tilde D(u)$ or $\tilde\varphi(v)$. Then the other is formed to satisfy

$$\tilde\varphi(v) = \int_0^1 e^{2\pi i u v}\, d\tilde D(u).$$

A CANONICAL SOLUTION (OF DENSITY ESTIMATION AND TESTING FOR A CONSTANT
DENSITY): Often from the observed data one can form a number $N$ of values
$\tilde d_j$ where $j = 0, 1, \ldots, N-1$ which represent the jumps at the points $j/N$ in
the unit interval $0 \le u \le 1$ of a raw distribution function $\tilde D(u)$, $0 \le u \le 1$.
The Fourier transform $\tilde\varphi(v)$ can then be found by

$$\tilde\varphi(v) = \sum_{j=0}^{N-1} \tilde d_j \exp\!\left(2\pi i v \frac{j}{N}\right).$$

Based on $\tilde\varphi(\cdot)$ one computes a criterion (called CAT) which determines smooth
estimators $\hat d(u)$, $\hat D(u)$, $\hat\varphi(v)$.
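
The finite Fourier sum over the jumps can be sketched in a few lines of Python. This is only an illustration: the function name `raw_fourier_transform` and the example jumps are assumptions introduced here, and equal jumps of size 1/N are used to mimic the "white noise" case.

```python
import cmath

def raw_fourier_transform(jumps, v):
    """Sum of d_j * exp(2*pi*i*v*j/N) over jumps d_j located at the points j/N."""
    n_points = len(jumps)
    return sum(d * cmath.exp(2j * cmath.pi * v * j / n_points)
               for j, d in enumerate(jumps))

# Hypothetical example: equal jumps 1/N, i.e. a raw distribution close to D(u) = u.
uniform_jumps = [1.0 / 8] * 8
phi_0 = raw_fourier_transform(uniform_jumps, 0)   # total mass of the jumps
phi_1 = raw_fourier_transform(uniform_jumps, 1)   # cancels for equal jumps
```

For equal jumps the transform at v = 0 returns the total mass, and at v = 1 the terms are equally spaced points on the unit circle, so they cancel.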

Conventional statistical methods test the null hypothesis $H_0 : D(u) = u$
by examining the deviations from zero of $\tilde D(u) - u$ or $\tilde\varphi(v)$. We accept
$H_0$ if $\hat d(u) \equiv 1$, or if the divergence

$$\hat\Delta = \int_0^1 \{\hat d(u) - 1\} \log \hat d(u)\, du$$

of $\hat d(u)$ is not significantly different from zero;
otherwise $\hat d(u)$ provides an estimator of $d(u)$.
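
As a rough numerical illustration of the divergence, the sketch below approximates the integral of {d(u) - 1} log d(u) by a midpoint rule. The function name `divergence`, the grid size, and the cosine-perturbed density are assumptions chosen for the example, not quantities from the paper.

```python
import math

def divergence(density, grid_size=2000):
    """Midpoint-rule approximation of the integral of (d(u) - 1) * log d(u) on [0, 1]."""
    total = 0.0
    for k in range(grid_size):
        u = (k + 0.5) / grid_size
        d = density(u)
        total += (d - 1.0) * math.log(d)
    return total / grid_size

# The constant density gives zero; a non-constant density gives a positive value.
flat = divergence(lambda u: 1.0)
wiggly = divergence(lambda u: 1.0 + 0.5 * math.cos(2 * math.pi * u))
```

Because (d - 1) and log d always share the same sign, the integrand is non-negative, which is why the divergence vanishes only for the constant density.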

The aim of this paper is to show how to formulate diverse statistical
questions so that their answer is provided by the foregoing "solution."

New parameter estimation criteria (which generate old familiar
estimators in cases where they should) can be formulated using the above structure.

In parametric inference one assumes a family of possible probability laws
specified by probability density functions $f(x, \theta)$ indexed by a parameter
$\theta$; to each $\theta$ one can determine a corresponding density $d_\theta(u)$, $0 \le u \le 1$,
where the subscript $\theta$ indicates that it is a function of the parameter $\theta$.
Define the (raw) information divergence [compare Kullback (1958)]

$$\tilde J(\theta) = \int_0^1 -\log d_\theta(u)\, d\tilde D(u).$$


The proposed estimator of $\theta$ (called a minimum divergence estimator) is the
value $\hat\theta$ at which $\tilde J(\theta)$ achieves its minimum value. Maximum likelihood
estimators $\hat\theta$ of a parameter $\theta$ from a random sample $X_1, \ldots, X_n$ can be defined as the
values minimizing a criterion of similar form, namely

$$L(\theta) = \int_{-\infty}^{\infty} -\log f(x, \theta)\, dF_n(x) = -\frac{1}{n} \sum_{i=1}^{n} \log f(X_i, \theta),$$

where $F_n(x)$ is the empirical distribution function. It appears plausible
that a theory of minimum divergence estimators can be developed which would
parallel the theory of maximum likelihood estimators (including robustness
considerations, which correspond to integrating $\log d_\theta(u)$ over a sub-interval
$\epsilon \le u \le 1 - \epsilon$).
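
To make the criterion form of maximum likelihood concrete, here is a small sketch that minimizes the average negative log density over a grid. The exponential family f(x, theta) = theta exp(-theta x), the observations, and the helper names are all assumptions made for the illustration; a grid search stands in for whatever optimizer one would really use.

```python
import math

def neg_mean_log_likelihood(sample, rate):
    """L(theta) = -(1/n) * sum of log f(X_i, theta), for f(x, theta) = theta * exp(-theta * x)."""
    return -sum(math.log(rate) - rate * x for x in sample) / len(sample)

def minimize_on_grid(criterion, low, high, steps=20000):
    """Crude grid search: return the grid point with the smallest criterion value."""
    candidates = [low + (high - low) * k / steps for k in range(steps + 1)]
    return min(candidates, key=criterion)

sample = [0.5, 1.5, 2.0, 4.0]   # hypothetical observations with mean 2.0
theta_hat = minimize_on_grid(lambda r: neg_mean_log_likelihood(sample, r), 0.01, 5.0)
```

For this family the minimizer is the reciprocal of the sample mean, so the grid search should land near 0.5.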


Another  criterion  useful  for  forming  parametric  estimators  from  densities 
defined  over  the  unit  interval  is:  choose  0 to  minimize 


When  applied  to  finite  parametric  normal  stationary  time  series  models,  this 
criterion  generates  asymptotically  efficient  estimators. 

When criteria yield equivalent results, we should suspect that they are
calculating essentially the same thing; I believe one can show this to be the
case here in the sense that log likelihood of the "sufficient statistics" is
asymptotically (and up to a constant multiplier) equal to $J(\theta)$ in the two
sample and multivariate cases, and equal to $H(\theta)$ in the one univariate
sample and univariate time series cases.


The non-parametric estimators $\hat d(u)$ which we propose for $d(u)$
are called autoregressive estimators; they are approximators to $d(u)$
expressed in terms of a parametric family of densities $d_\theta(u)$ of the form

$$d_\theta(u) = \sigma_m^2 \left| 1 + a_1 e^{2\pi i u} + \cdots + a_m e^{2\pi i u m} \right|^{-2}$$

for parameters $m$, $\sigma_m^2$, $a_1, \ldots, a_m$ to be estimated. Autoregressive estimators
are easily evaluated at all $u$ in $0 \le u \le 1$, and easily provide estimators
of derivatives and integrals of the density $d(u)$.
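
A short sketch shows how cheaply an autoregressive density can be evaluated once its parameters are given. The names `autoregressive_density` and `integrate`, and the first-order coefficient 0.5, are assumptions for illustration; the choice of the scale as 1 minus the squared coefficient is the standard first-order normalization that makes the density integrate to one.

```python
import cmath

def autoregressive_density(u, sigma2, coeffs):
    """sigma2 * |1 + a_1 e^{2 pi i u} + ... + a_m e^{2 pi i u m}|^(-2)."""
    transfer = 1.0 + sum(a * cmath.exp(2j * cmath.pi * u * (k + 1))
                         for k, a in enumerate(coeffs))
    return sigma2 / abs(transfer) ** 2

def integrate(func, grid_size=4000):
    """Midpoint rule on the unit interval."""
    return sum(func((k + 0.5) / grid_size) for k in range(grid_size)) / grid_size

a1 = 0.5                                             # hypothetical coefficient
order_zero = autoregressive_density(0.3, 1.0, [])    # order m = 0: constant density
total_mass = integrate(lambda u: autoregressive_density(u, 1.0 - a1 ** 2, [a1]))
```

With order zero the density is identically one, which is the "white noise" case; with order one and the stated scale the density integrates to one.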


1. One Sample Statistical Inference

To identify the continuous distribution function $F(x)$ of a random
sample $X_1, \ldots, X_n$ one should form first the EDF (empirical distribution
function)

$$F_n(x) = \frac{1}{n} \sum_{j=1}^{n} e(x - X_j), \quad -\infty < x < \infty,$$

where $e(x) = 1$ if $x \ge 0$ and $e(x) = 0$ if $x < 0$.
In other words, $F_n(x)$ is the fraction of observations less than or equal
to $x$. The inverse distribution function $F^{-1}(u)$ (also called the quantile
function, in which case it is denoted $Q(u)$) of $F(x)$ is defined by

$$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}, \quad 0 \le u \le 1.$$

The quantile function has the basic property $FQ(u) = u$. The EQF (empirical
quantile function) is defined by

$$Q_n(u) = F_n^{-1}(u) = \inf\{x : F_n(x) \ge u\}, \quad 0 \le u \le 1.$$
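
The two empirical functions can be sketched directly from their definitions. The function names and the sample values below are assumptions for illustration; the quantile routine covers 0 < u <= 1 and relies on the infimum being attained at an order statistic.

```python
import math

def empirical_distribution(sample, x):
    """F_n(x): fraction of observations less than or equal to x."""
    return sum(1 for obs in sample if obs <= x) / len(sample)

def empirical_quantile(sample, u):
    """Q_n(u) = inf{x : F_n(x) >= u} for 0 < u <= 1, read off the order statistics."""
    ordered = sorted(sample)
    index = max(math.ceil(u * len(ordered)), 1)   # smallest j with j/n >= u
    return ordered[index - 1]

sample = [3.1, 0.4, 2.2, 1.7]   # hypothetical observations
```

For example, `empirical_distribution(sample, 2.0)` counts the two observations 0.4 and 1.7, and `empirical_quantile(sample, 0.5)` returns the second order statistic.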


We show that it provides a powerful approach to test the hypothesis
$H_0 : F(x) = F_0\!\left(\frac{x - \mu}{\sigma}\right)$ for some real $\mu$ and $\sigma > 0$, where $F_0(\cdot)$ is a
specified distribution function and $\mu$ and $\sigma$ are unknown parameters
(ultimately to be estimated). In terms of quantile functions one can express
$H_0$ as follows:

$$H_0 : Q(u) = \mu + \sigma Q_0(u) \quad \text{for some real } \mu \text{ and } \sigma > 0.$$

To prove this formula for $Q(u)$ write $x = Q(u)$ iff $F(x) = u$ iff
$F_0\!\left(\frac{x - \mu}{\sigma}\right) = u$ iff $\frac{x - \mu}{\sigma} = Q_0(u)$.

The existence of the derivative $f(x)$ of $F(x)$ implies the existence
of the derivative, denoted $q(u)$, of $Q(u)$. Further $F(Q(u)) = u$ implies

$$f(Q(u))\, q(u) = 1.$$

We call $q(u)$ the quantile-density function and introduce the density-
quantile function

$$fQ(u) = f(Q(u)).$$

For any $p$ in $0 < p < 1$ and $u$ in $0 < u < 1$

$$Q(u) - Q(p) = \int_p^u q(s)\, ds.$$

Therefore the hypothesis $H_0$ is equivalent to the hypothesis $H_0'$ defined in
terms of quantile-density functions or density-quantile functions:

$$H_0' : q(u) = \sigma q_0(u) \quad \text{for some } \sigma > 0,$$

$$H_0' : fQ(u) = \frac{1}{\sigma}\, f_0Q_0(u) \quad \text{for some } \sigma > 0.$$


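The concepts are now all assembled to show how to formulate the classical
goodness of fit problem (testing $H_0$) as a density estimation problem.
Define

$$d(u) = \frac{1}{\sigma_0}\, f_0Q_0(u)\, q(u) = \frac{1}{\sigma_0}\, \frac{f_0Q_0(u)}{fQ(u)},$$

defining

$$D(u) = \frac{1}{\sigma_0} \int_0^u f_0Q_0(s)\, q(s)\, ds = \frac{1}{\sigma_0} \int_0^u \frac{f_0Q_0(s)}{fQ(s)}\, ds, \qquad \sigma_0 = \int_0^1 \frac{f_0Q_0(s)}{fQ(s)}\, ds.$$

The null hypothesis $H_0$ is then equivalent to $\sigma_0 = \sigma$, $d(u) = 1$,
$D(u) = u$.

Natural "raw" estimators are obtained from the empirical quantile function
$Q_n(u)$, defining

$$\tilde D(u) = \frac{1}{\tilde\sigma_0} \int_0^u f_0Q_0(s)\, dQ_n(s), \qquad \tilde\sigma_0 = \int_0^1 f_0Q_0(s)\, dQ_n(s).$$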

These formulas are easily computed in terms of the order statistics

$$X_{(1)} < X_{(2)} < \cdots < X_{(n)},$$

which are the values $X_1, \ldots, X_n$ rearranged in increasing size, since explicitly
the EQF is given by

$$Q_n(u) = \begin{cases} X_{(0)} & \text{for } u = 0, \\ X_{(j)} & \text{for } \frac{j-1}{n} < u \le \frac{j}{n} \quad (j = 1, \ldots, n), \\ X_{(n+1)} & \text{for } 1 < u < \frac{n+1}{n}, \end{cases}$$

where $X_{(0)}$ and $X_{(n+1)}$ are values (which could be $-\infty$ or $\infty$)
representing our prior judgment of the lower bound and upper bound of the
probability distribution. Note $Q_n(u)$ is a piecewise constant function with
jumps at $j/n$, $j = 0, 1, \ldots, n$, of size $X_{(j+1)} - X_{(j)}$.


Spacings. If $X_1, \ldots, X_n$ is a random sample of a continuous random
variable $X$, with order statistics denoted $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$, its
spacings are defined by [compare Pyke (1972)]

$$q_{j,n} = n\,(X_{(j+1)} - X_{(j)}), \quad j = 0, 1, \ldots, n-1,$$

where $X_{(0)}$ is a suitably chosen finite number, and its modified spacings
are defined by

$$d_{j,n} = f_0Q_0\!\left(\frac{j}{n}\right) q_{j,n}, \quad j = 0, 1, \ldots, n-1,$$

where $f_0Q_0(u)$ is a specified density-quantile function.

Non-parametric raw estimators of the distribution function $F(x)$ and
quantile function $Q(u)$ are $F_n(x)$ and $Q_n(u)$ respectively. The spacing
$q_{j,n}$ is a difference quotient of $Q_n(u)$ at $u = j/n$ and, therefore, can be
regarded as a raw estimator of $q(u)$ at $u = j/n$. However, $q_{j,n}$ is not
by itself a consistent estimator of $q(j/n)$. Consistent (and perhaps
"efficient") estimators of $q(u)$ can be obtained using time series theoretic
methods. More importantly our methods of estimation of $q(u)$, and therefore
$fQ(u)$, yield not only their values at individual points $u$, but also various
functionals (including derivatives and integrals) which are needed for adaptive
and robust statistical data analysis.
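
The spacings and modified spacings are simple to compute from the order statistics, as the following sketch suggests. The function names, the sample, and the lower value 0 are assumptions for illustration, and evaluating the density-quantile function at j/n is likewise an assumption about the exact evaluation point. The exponential law is used because its density-quantile function has the simple form 1 - u.

```python
def spacings(sample, lower_value):
    """q_{j,n} = n * (X_(j+1) - X_(j)) for j = 0..n-1, taking X_(0) = lower_value."""
    n = len(sample)
    ordered = [lower_value] + sorted(sample)
    return [n * (ordered[j + 1] - ordered[j]) for j in range(n)]

def modified_spacings(sample, lower_value, density_quantile):
    """d_{j,n} = f0Q0(j/n) * q_{j,n}; the evaluation point j/n is an assumption."""
    n = len(sample)
    return [density_quantile(j / n) * q
            for j, q in enumerate(spacings(sample, lower_value))]

# Exponential law: Q0(u) = -log(1 - u), hence f0Q0(u) = 1 - u.
raw = spacings([0.2, 1.0, 0.5], lower_value=0.0)
modified = modified_spacings([0.2, 1.0, 0.5], 0.0, lambda u: 1.0 - u)
```

With the exponential choice the modified spacings reduce to (n - j) times the gaps between successive order statistics, the familiar normalized spacings.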


These  methods  extend  readily  to  censored  observations  and  subsets  of 
order  statistics;  therefore  they  have  applications  in  biometry  and  reliability 
theory. 

Our approach to solving the basic statistical inference questions
given a random sample $X_1, \ldots, X_n$ can now be summarized as follows:

1. To non-parametrically estimate the unknown probability density
function $f(x)$, first non-parametrically estimate the unknown density-quantile
function $fQ(u)$ through estimating the ratio

$$d(u) = \frac{1}{\sigma_0}\, \frac{f_0Q_0(u)}{fQ(u)}, \qquad \sigma_0 = \int_0^1 \frac{f_0Q_0(s)}{fQ(s)}\, ds,$$

where $f_0Q_0(u)$ is a specified density-quantile chosen to "guarantee" that
$d(u)$ have various integrability properties whose necessity will arise in
the course of our theoretical development.

2. To test whether a specified $f_0(x)$ is the true probability density
(up to location and scale parameters $\mu$ and $\sigma$) choose the corresponding
density-quantile function $f_0Q_0(u)$ as the function to be used in forming
from spacings raw estimators $\tilde D(u)$ and $\tilde\varphi(v)$ and tests of the hypothesis
that the density $d(u)$ is a constant function.

3. To form efficient estimators $\hat\mu$ and $\hat\sigma$ of the location and scale
parameters it suffices to know (or to have estimated) $f_0Q_0(u)$ and $Q_0(u)$
since then one treats the estimation as a problem of regression on a
continuous parameter time series using the fact that, as $n \to \infty$, the asymptotic
distribution [compare Shorack (1972)] of

$$\sqrt{n}\, fQ(u)\,[Q_n(u) - Q(u)] = \frac{\sqrt{n}}{\sigma}\, f_0Q_0(u)\,\{Q_n(u) - \mu - \sigma Q_0(u)\}$$

is the Brownian bridge $B(u)$ which is a normal zero mean stochastic process
with covariance kernel $E[B(s) B(t)] = \min(s,t) - st$. Estimators $\hat\mu$ and
$\hat\sigma$ are then of the usual regression analysis form [compare Parzen (1961), (1970)]


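$$\begin{pmatrix} \hat\mu \\ \hat\sigma \end{pmatrix} = \mathrm{Inf}^{-1} \begin{pmatrix} T_\mu \\ T_\sigma \end{pmatrix}.$$

The information matrix $\mathrm{Inf}$ is defined by

$$\mathrm{Inf} = \begin{pmatrix} \langle f_0Q_0,\ f_0Q_0 \rangle & \langle f_0Q_0,\ Q_0 f_0Q_0 \rangle \\ \langle Q_0 f_0Q_0,\ f_0Q_0 \rangle & \langle Q_0 f_0Q_0,\ Q_0 f_0Q_0 \rangle \end{pmatrix}$$

in terms of a (reproducing kernel Hilbert space) inner product

$$\langle f, g \rangle = \int_0^1 f'(t)\, g'(t)\, dt = -\int_0^1 f''(t)\, g(t)\, dt$$

between differentiable functions $f(t)$ and $g(t)$ satisfying $f'(0)\, g(0) =
f'(1)\, g(1) = 0$. The statistics $T$ are linear combinations of order
statistics found as follows:

$$T_\mu = -\int_0^1 Q_n(t)\, f_0Q_0(t)\, (f_0Q_0)''(t)\, dt = \int_0^1 Q_n(t)\, W_\mu(t)\, dt,$$

$$T_\sigma = -\int_0^1 Q_n(t)\, f_0Q_0(t)\, (Q_0 f_0Q_0)''(t)\, dt = \int_0^1 Q_n(t)\, W_\sigma(t)\, dt.$$

Explicitly,

$$W_\mu(u) = f_0Q_0(u)\, J_0'(u),$$

$$W_\sigma(u) = J_0(u) + Q_0(u)\, W_\mu(u).$$

The function $J_0(u)$ is defined by

$$J_0(u) = -(f_0Q_0)'(u)$$

and is called the score function. It plays a basic role in the theory of
non-parametric estimation, and is most easily estimated using the fact that it is
(up to sign) the derivative of the density-quantile function, rather than the formula

$$J_0(u) = -\frac{f_0'(Q_0(u))}{f_0(Q_0(u))}.$$

A list of density-quantile functions and score functions of familiar
univariate continuous probability laws is given in Table I.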
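
As one concrete instance of a density-quantile function and its score function, the sketch below treats the standard normal law, for which the score function is known to equal the quantile function itself. It assumes the convention that the score is minus the derivative of the density-quantile function; the function names and the step size are illustrative choices.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def density_quantile(u):
    """f0Q0(u) = f0(Q0(u)) for the standard normal law."""
    return STANDARD_NORMAL.pdf(STANDARD_NORMAL.inv_cdf(u))

def score_by_differencing(u, step=1e-5):
    """J0(u) = -(f0Q0)'(u), approximated by a central difference (assumed sign convention)."""
    return -(density_quantile(u + step) - density_quantile(u - step)) / (2 * step)

u = 0.8
numerical_score = score_by_differencing(u)
closed_form_score = STANDARD_NORMAL.inv_cdf(u)   # for the normal law, J0(u) = Q0(u)
```

The agreement of the two values illustrates why differentiating the density-quantile function is a convenient route to the score function.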
2. Tests for the Equality of Two Distributions

This section introduces a density estimation approach to non-parametric
tests of the hypothesis that two independent samples (a random sample
$X_1, \ldots, X_m$ of a continuous random variable $X$, and a random sample
$Y_1, \ldots, Y_n$ of a continuous random variable $Y$) are drawn from identical
populations in the sense that $X$ and $Y$ are identically distributed; in
symbols,

$$H_0 : F_X(x) = F_Y(x) \quad \text{for all } x \text{ in } -\infty < x < \infty.$$

One way to define a distribution function $D(u)$, $0 \le u \le 1$, such that
$H_0$ is equivalent to the hypothesis $H_0' : D(u) = u$, $0 \le u \le 1$, is to define

$$D(u) = F_X(Q_Y(u)) \quad \text{or} \quad D(u) = F_Y(Q_X(u)).$$

Such statistics remain to be investigated. A statistic which corresponds to
currently used tests of $H_0$ is obtained by defining

$$H(x) = \lambda F_X(x) + (1 - \lambda)\, F_Y(x)$$

where $\lambda$ is the limit of $\frac{m}{m + n}$, the fraction of $X$ values in the combined
samples of $X$ and $Y$ values. In words, $H(x)$ is a mixture of the
distributions of $X$ and $Y$.

Denote by $F_{X,m}(x)$ and $F_{Y,n}(x)$ the EDF of the $X$- and $Y$-samples,
and let $H_N(x)$ be the EDF of the combined samples of $X$ and $Y$ values,
where $N = m + n$. Then, defining $\lambda_N = m/N$,

$$H_N(x) = \lambda_N F_{X,m}(x) + (1 - \lambda_N)\, F_{Y,n}(x).$$

Since $F_X(x) = F_Y(x) = H(x)$ under $H_0$, one can test this hypothesis
by testing the uniformity of

$$D(u) = F_X(H^{-1}(u)), \quad 0 \le u \le 1,$$

whose natural raw estimator is

$$\tilde D(u) = F_{X,m}(H_N^{-1}(u)).$$

This approach can be readily extended to testing the equality of $k$ samples
of random variables $X_1, \ldots, X_k$, if one considers for $j = 1, \ldots, k$,

$$D_j(u) = F_{X_j}(H^{-1}(u)),$$

where $H(x)$ is the distribution function of the combined sample.

Now $H_N^{-1}(u)$ is a piecewise constant function whose value
in the interval $\frac{k-1}{N} < u \le \frac{k}{N}$ is the $k$-th value in the combined sample.
Therefore, for $j = 1, \ldots, m - 1$,

$$\tilde D(u) = \begin{cases} \dfrac{j}{m} & \text{for } \dfrac{R(X_{(j)})}{N} \le u < \dfrac{R(X_{(j+1)})}{N}, \\[2mm] 1 & \text{for } \dfrac{R(X_{(m)})}{N} \le u \le 1, \end{cases}$$

where $R(X_j)$ is the rank of $X_j$ as a member of the combined sample
$X_1, \ldots, X_m, Y_1, \ldots, Y_n$. In words, $\tilde D(u)$ is a piecewise constant distribution
function with jumps of size $1/m$ at all points $u$ of the form
$u = R(X_j)/N$, $j = 1, 2, \ldots, m$.

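The Fourier transforms

$$\varphi(v) = \int_0^1 e^{2\pi i u v}\, dD(u)$$

have natural raw estimators

$$\tilde\varphi(v) = \int_0^1 e^{2\pi i u v}\, d\tilde D(u) = \frac{1}{m} \sum_{j=1}^{m} \exp\!\left\{2\pi i v\, \frac{R(X_j)}{N}\right\}.$$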
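
A minimal sketch of the rank-based raw Fourier transform follows. The helper names and the two small samples are assumptions for illustration, and the code assumes there are no ties in the combined sample.

```python
import cmath

def x_ranks_in_combined(x_sample, y_sample):
    """Ranks R(X_j) of the X values within the combined sample (no ties assumed)."""
    combined = sorted(x_sample + y_sample)
    return [combined.index(x) + 1 for x in x_sample]

def raw_phi(x_sample, y_sample, v):
    """(1/m) * sum over j of exp(2 pi i v R(X_j) / N), with N = m + n."""
    m, total = len(x_sample), len(x_sample) + len(y_sample)
    return sum(cmath.exp(2j * cmath.pi * v * r / total)
               for r in x_ranks_in_combined(x_sample, y_sample)) / m

xs = [0.3, 2.5, 1.1]   # hypothetical X sample
ys = [0.9, 1.8, 3.0]   # hypothetical Y sample
ranks = x_ranks_in_combined(xs, ys)
```

Only the ranks of the X values enter the transform, so it is unchanged by any monotone increasing transformation applied to both samples.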


Many statistics (denoted $T_N$, where $N = m + n$ is the total sample
size) which have been suggested to test $H_0$ are linear combinations of the
rank-order statistics $R(X_j)$. Chernoff and Savage introduced a representation
for linear rank statistics in a pioneering paper (1958):

$$T_N = \int_{-\infty}^{\infty} J_N(H_N(x))\, dF_{X,m}(x),$$

where $J_N(t)$ is a score function which tends, as $N \to \infty$, "suitably" to a
limit $J(t)$. The foregoing representation of $T_N$ may be written (by
suitably defining $J_N(u)$)

$$T_N = \int_0^1 J_N(u)\, d\tilde D(u).$$

Let us show how test statistics of this form arise from our point of
view. To test the hypothesis

$$H_0 : d(u) = 1$$

against a simple alternative

$$H_1 : d(u) = d_1(u)$$

where $d_1(u)$ is a specified function one can show that an asymptotic
likelihood ratio test statistic is the "correlator"

$$R = \int_0^1 \{d_1(t) - 1\}\, d\tilde D(t).$$


Now suppose that the alternative family of densities is denoted $d_\theta(t)$
to indicate that it is parametrized by a parameter $\theta$; suppose we have the
expansion

$$d_\theta(t) = 1 + \theta\, \delta(t) \quad \text{for } \theta \text{ near zero}$$

where

$$\delta(t) = \left. \frac{\partial}{\partial\theta}\, d_\theta(t) \right|_{\theta = 0}.$$

The "correlator" statistic as a likelihood ratio statistic for testing
$H_0$ against $H_1$ is then equivalent to the linear detector

$$R_\delta = \int_0^1 \delta(t)\, d\tilde D(t)$$

for $\theta$ near zero. By a direct calculation of $\delta(u)$ below we show that to
test $H_0 : F_X(x) = F_Y(x) = F(x)$ against the alternative

$$H_1 : F_X(x) = F(x), \quad F_Y(x) = F(x - \theta)$$

the best test for $\theta$ close to 0 is based on the statistic $R_\delta$ where

$$\delta(u) = -(1 - \lambda)\, J(u)$$

where $J(u)$ is the score function

$$J(u) = -(fQ)'(u).$$

Assume that the density $f(x)$ is a symmetric function of $x$; then

$$E_\theta[R_\delta] = \int_0^1 \delta(u)\, d_\theta(u)\, du = \theta \int_0^1 \delta^2(u)\, du = \theta \int_0^1 (1 - \lambda)^2 J^2(u)\, du$$

so that an approximately unbiased estimator of $\theta$ is

$$\theta^* = -\int_0^1 J(u)\, d\tilde D(u) \Big/ (1 - \lambda) \int_0^1 J^2(u)\, du$$

whose variance may be shown to be approximately equal to

$$\mathrm{Var}(\theta^*) = \left\{ N \lambda (1 - \lambda) \int_0^1 J^2(u)\, du \right\}^{-1}$$

which is also the variance of the maximum likelihood estimator.

When the linear detector $R_\delta$ is used to test whether the location
parameter $\theta$ equals 0, one accepts this hypothesis (for large sample
sizes $N$) if

$$\frac{|\theta^*|^2}{N\, \mathrm{Var}(\theta^*)} = \frac{|R_\delta|^2}{N\, \mathrm{Var}(R_\delta)} = \frac{\lambda \left\{ \int_0^1 \delta(t)\, d\tilde D(t) \right\}^2}{(1 - \lambda) \int_0^1 \delta^2(t)\, dt}$$

is below a suitable threshold (one can argue that the threshold is a number
of the form $C/N$ where $C$ is often 2 or 4). I am proposing that instead
of $R_\delta$ one use a non-parametric estimator $\hat\Delta$ of the divergence

$$\Delta = \int_0^1 \{d_\theta(u) - 1\} \log d_\theta(u)\, du \approx \theta^2 \int_0^1 \delta^2(t)\, dt;$$

if $\theta$ were estimated by $\theta^*$, let $\hat\Delta$ be denoted by $\Delta^*$:

$$\Delta^* = \left| \int_0^1 \delta(t)\, d\tilde D(t) \right|^2 \Big/ \int_0^1 \delta^2(t)\, dt.$$

It seems plausible that the proposed "universal" test of the hypothesis $\theta = 0$,
which accepts it when $\hat\Delta$ is below a suitable threshold of the form of $C/N$
(for a suitable value of the constant $C$), would perform as well as the best
test of the form of $R_\delta$ since it appears to be asymptotically calculating
(up to a constant multiplier) the same statistic!
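
The statistic formed by squaring the linear detector and dividing by the integral of the squared weight function can be sketched from ranks alone. The example assumes the logistic score J(u) = 2u - 1 (a standard fact for the logistic law, not taken from this text), takes the mixing fraction as m/N, and uses hypothetical function names and rank configurations.

```python
def logistic_delta(u, lam):
    """delta(u) = -(1 - lam) * J(u), using the logistic score J(u) = 2u - 1 (assumed)."""
    return -(1.0 - lam) * (2.0 * u - 1.0)

def divergence_statistic(x_ranks, total_size):
    """Squared linear detector divided by the integral of delta squared."""
    m = len(x_ranks)
    lam = m / total_size                               # mixing fraction taken as m / N
    detector = sum(logistic_delta(r / total_size, lam) for r in x_ranks) / m
    delta_squared_integral = (1.0 - lam) ** 2 / 3.0    # integral of (2u - 1)^2 is 1/3
    return detector ** 2 / delta_squared_integral

separated = divergence_statistic([1, 2, 3], 6)     # X values occupy the lowest ranks
interleaved = divergence_statistic([1, 3, 5], 6)   # X ranks spread symmetrically about 1/2
```

Ranks crowded at one end give a large value, while ranks spread symmetrically about the middle give a value near zero for this odd weight function.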


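To calculate $\delta(u)$ we must calculate the density $d_\theta(u)$ corresponding
to the canonical distribution function

$$D_\theta(u) = F H_\theta^{-1}(u)$$

where

$$H_\theta(x) = \lambda F_X(x) + (1 - \lambda)\, F_Y(x) = \lambda F(x) + (1 - \lambda)\, F(x - \theta).$$

To establish a formula for $d_\theta(u)$ we obtain from the defining equation for
$D_\theta(u)$ that

$$H_\theta F^{-1} D_\theta(u) = u,$$

whence

$$u = \lambda D_\theta(u) + (1 - \lambda)\, F(Q D_\theta(u) - \theta)$$

where $Q(u) = F^{-1}(u)$ is the quantile function. Differentiating with respect
to $u$,

$$1 = \lambda d_\theta(u) + (1 - \lambda)\, f(Q D_\theta(u) - \theta)\, q(D_\theta(u))\, d_\theta(u)$$

whence

$$\{d_\theta(u)\}^{-1} = \lambda + (1 - \lambda)\, f(Q D_\theta(u) - \theta)\, q(D_\theta(u)).$$

Differentiating with respect to $\theta$:

$$-\{d_\theta(u)\}^{-2}\, \frac{\partial}{\partial\theta} d_\theta(u) = (1 - \lambda)\, f(Q D_\theta(u) - \theta)\, q'(D_\theta(u))\, \frac{\partial}{\partial\theta} D_\theta(u)$$
$$\qquad + (1 - \lambda)\, q(D_\theta(u))\, f'(Q D_\theta(u) - \theta) \left\{ q(D_\theta(u))\, \frac{\partial}{\partial\theta} D_\theta(u) - 1 \right\}.$$

Setting $\theta = 0$ and using the identities $q(u)\, f'(Q(u)) = (fQ)'(u)$, $fQ(u)\, q'(u)
+ (fQ)'(u)\, q(u) = 0$, one obtains the desired conclusion: $\delta(u) = -(1 - \lambda)\, J(u)$.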
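
The conclusion can be checked numerically for a particular law. The sketch below assumes the logistic distribution, whose score function is 2u - 1, and evaluates the canonical density as f(x)/h(x) at the point x where the mixture distribution function equals u; differencing in the shift parameter then approximates the weight function. The function names, the bisection bounds, and the parameter values are illustrative assumptions.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    p = logistic_cdf(x)
    return p * (1.0 - p)

def canonical_density(u, theta, lam):
    """d_theta(u) = f(x) / h_theta(x) at the root x of H_theta(x) = u, found by bisection."""
    mixture_cdf = lambda x: lam * logistic_cdf(x) + (1 - lam) * logistic_cdf(x - theta)
    low, high = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if mixture_cdf(mid) < u:
            low = mid
        else:
            high = mid
    x = 0.5 * (low + high)
    mixture_pdf = lam * logistic_pdf(x) + (1 - lam) * logistic_pdf(x - theta)
    return logistic_pdf(x) / mixture_pdf

u, lam, theta = 0.7, 0.4, 1e-4
numerical_delta = (canonical_density(u, theta, lam)
                   - canonical_density(u, -theta, lam)) / (2 * theta)
predicted_delta = -(1 - lam) * (2 * u - 1)   # -(1 - lambda) J(u) with J(u) = 2u - 1
```

The central difference in the shift parameter agrees with the predicted value to within the accuracy of the differencing step.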

To test a scale parameter $\theta$, one considers alternative hypotheses
$G(x) = F(x\theta)$, where $\theta = 1$ represents the null hypothesis. Using the
foregoing argument one can show that

$$\delta(u) = -(1 - \lambda)\, \{Q(u)\, fQ(u)\}' = (1 - \lambda)\, \{Q(u)\, J(u) - 1\}.$$

Asymptotic variance of linear rank statistics. In terms of the
canonical distribution function $D(u)$ and its density $d(u)$, we can obtain
rather simple formulas for the asymptotic variance $\sigma^2$ of the linear rank
statistics of the form

$$T_N = \int_0^1 J(u)\, d\tilde D(u),$$

which satisfy the conditions of the Chernoff-Savage theorem (1958):
$\sqrt{N}\,(T_N - \mu)$ is asymptotically $N(0, \sigma^2)$ where

$$\mu = \int_0^1 J(t)\, dD(t),$$

$$\sigma^2 = \frac{2}{1 - \lambda} \int_0^1\!\!\int_0^1 ds\, dt\; e(t - s)\, J'(s)\, J'(t)\, \Big\{ \big(s - \lambda D(s)\big)\big(1 - t - \lambda(1 - D(t))\big)\, d(s)\, d(t)$$
$$\qquad + \frac{1 - \lambda}{\lambda}\, D(s)\big(1 - D(t)\big)\big(1 - \lambda d(s)\big)\big(1 - \lambda d(t)\big) \Big\}.$$

Under the null hypothesis $D(u) = u$,

$$\mu = \int_0^1 J(t)\, dt,$$

$$\sigma^2 = \frac{2(1 - \lambda)}{\lambda} \int_0^1\!\!\int_0^1 ds\, dt\; e(t - s)\, J'(s)\, J'(t)\, s(1 - t)$$

$$\quad = \frac{1 - \lambda}{\lambda} \int_0^1 \Big[ J(t) - \int_0^1 J(s)\, ds \Big]^2 dt.$$
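
The equality of the double-integral form and the centered-square form of the null variance can be checked numerically. The sketch assumes the prefactors 2(1 - lambda)/lambda and (1 - lambda)/lambda for the two forms, uses the linear score J(u) = 2u - 1 as a hypothetical example, and weights the diagonal grid cells by one half because only half of each lies in the region s < t.

```python
def null_variance_double_integral(score_derivative, lam, grid_size=400):
    """(2(1-lam)/lam) * double integral over s < t of J'(s) J'(t) s (1 - t)."""
    h = 1.0 / grid_size
    total = 0.0
    for i in range(grid_size):
        s = (i + 0.5) * h
        for k in range(i, grid_size):
            t = (k + 0.5) * h
            weight = 0.5 if k == i else 1.0   # diagonal cells lie half inside s < t
            total += weight * score_derivative(s) * score_derivative(t) * s * (1.0 - t)
    return 2.0 * (1.0 - lam) / lam * total * h * h

def null_variance_centered(score, lam, grid_size=400):
    """((1-lam)/lam) * integral of (J(t) - mean of J)^2."""
    points = [(k + 0.5) / grid_size for k in range(grid_size)]
    mean = sum(score(t) for t in points) / grid_size
    return (1.0 - lam) / lam * sum((score(t) - mean) ** 2 for t in points) / grid_size

lam = 0.5
from_double_integral = null_variance_double_integral(lambda u: 2.0, lam)
from_centering = null_variance_centered(lambda u: 2.0 * u - 1.0, lam)
```

For this score both forms should be close to 1/3, the variance of 2U - 1 for a uniform U, multiplied by the factor (1 - lambda)/lambda = 1.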


An important extension of these results is to $J(u) = e^{2\pi i u v}$; one
obtains that under the null hypothesis of independence, $\{\tilde\varphi(v),\ v = \pm 1, \pm 2, \ldots\}$
are asymptotically independent, zero-mean normal random variables.


Non-Parametric Regression

When the data consist of a random sample $\mathbf{X}_1, \ldots, \mathbf{X}_n$ of an $m$-dimensional
random vector

$$\mathbf{X} = (X^1, \ldots, X^m),$$

it is often of interest to test the hypothesis $H_0 : X^1, \ldots, X^m$ are independent
random variables.

Let the joint distribution function and probability density of $\mathbf{X}$ be
denoted by $F(x_1, \ldots, x_m)$ and $f(x_1, \ldots, x_m)$ respectively. Let its marginal
distribution functions and densities be denoted by $F_k(x_k)$ and $f_k(x_k)$,
$k = 1, \ldots, m$.
Note that $f_k(x_k)$ is the probability density of the $k$-th component $X^k$.
Corresponding to each density $f_k(x_k)$ there is a quantile function $Q_k(u_k)$,
and a density quantile function $f_kQ_k(u_k)$.

The hypothesis that the components of $\mathbf{X}$ are independent can be
expressed

$$H_0 : F(x_1, \ldots, x_m) = F_1(x_1) \cdots F_m(x_m).$$

Equivalently,

$$H_0 : F(Q_1(u_1), \ldots, Q_m(u_m)) = u_1 \cdots u_m.$$

Define

$$D(u_1, \ldots, u_m) = F(Q_1(u_1), \ldots, Q_m(u_m));$$

it is the joint distribution function of the uniformly distributed random
variables

$$U^1 = F_1(X^1), \ldots, U^m = F_m(X^m).$$

We shall test $H_0$ by estimating the joint density function

$$d(u_1, \ldots, u_m) = \frac{\partial^m}{\partial u_1 \cdots \partial u_m} D(u_1, \ldots, u_m) = \frac{f(Q_1(u_1), \ldots, Q_m(u_m))}{f_1Q_1(u_1) \cdots f_mQ_m(u_m)}.$$

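Note that in the case of a multivariate location and scale parameter
family of probability densities

$$f(x_1, \ldots, x_m) = \frac{1}{\sigma_1 \cdots \sigma_m}\, f^0\!\left( \frac{x_1 - \mu_1}{\sigma_1}, \ldots, \frac{x_m - \mu_m}{\sigma_m} \right)$$

each marginal density is of the form

$$f_k(x_k) = \frac{1}{\sigma_k}\, f_k^0\!\left( \frac{x_k - \mu_k}{\sigma_k} \right)$$

and the individual quantile functions are of the form

$$Q_k(u_k) = \mu_k + \sigma_k Q_k^0(u_k).$$

Therefore

$$d(u_1, \ldots, u_m) = \frac{f^0(Q_1^0(u_1), \ldots, Q_m^0(u_m))}{f_1^0Q_1^0(u_1) \cdots f_m^0Q_m^0(u_m)}.$$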

Therefore the density $d(u_1, \ldots, u_m)$ does not depend on location and
scale parameters and is a measure of association or dependence. In particular
$d(u_1, \ldots, u_m)$ is identically equal to 1 if and only if all components of $\mathbf{X}$
are independent. An overall measure of association can be defined by the
divergence

$$\Delta = \int_0^1 \cdots \int_0^1 \{d(u_1, \ldots, u_m) - 1\} \log d(u_1, \ldots, u_m)\, du_1 \cdots du_m.$$

We call: $d(u_1, \ldots, u_m)$ the regression-density of $\mathbf{X}$; $\Delta$ the regression-
density-divergence; $D(u_1, \ldots, u_m)$ the regression-distribution function.

For the bivariate normal distribution with correlation coefficient $\rho$,
the regression density is given by

$$d(s, t; \rho) = (1 - \rho^2)^{-1/2} \exp\!\Big[ -\{2(1 - \rho^2)\}^{-1} \big\{ \rho^2 |\Phi^{-1}(s)|^2 + \rho^2 |\Phi^{-1}(t)|^2 - 2\rho\, \Phi^{-1}(s)\, \Phi^{-1}(t) \big\} \Big],$$

where $\Phi$ denotes the standard normal distribution function,
and the regression-density-divergence is given by

$$\Delta = \frac{\rho^2}{1 - \rho^2}.$$

For a multivariate normal distribution with correlation matrix $R$ (that is,
$R$ is the matrix of correlation coefficients between the components of the
random vector) the regression-density-divergence is given by

$$\Delta = \tfrac{1}{2}\, \mathrm{tr}(R^{-1} - I).$$


To estimate the regression-distribution D from a sample of size n we
use the natural raw estimator

$$\tilde D(u_1,\ldots,u_m) = \tilde F(\tilde Q_1(u_1),\ldots,\tilde Q_m(u_m))$$

where $\tilde F(x_1,\ldots,x_m)$ is the empirical distribution function of the n
random vectors $X_1,\ldots,X_n$, and $\tilde Q_k(u)$ is the sample quantile
function of the k-th components of these vectors.


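Define the Fourier-Stieltjes transforms

$$\tilde\varphi(v_1,\ldots,v_m) = \int_0^1 \cdots \int_0^1 e^{2\pi i (v_1 u_1 + \cdots + v_m u_m)} \, d\tilde D(u_1,\ldots,u_m)$$

$$\varphi(v_1,\ldots,v_m) = \int_0^1 \cdots \int_0^1 e^{2\pi i (v_1 u_1 + \cdots + v_m u_m)} \, dD(u_1,\ldots,u_m)$$

$$= \int_0^1 \cdots \int_0^1 e^{2\pi i (v_1 u_1 + \cdots + v_m u_m)} \, d(u_1,\ldots,u_m) \, du_1 \cdots du_m .$$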


It will be shown in the sequel that $\tilde D$ tends to $D$ and $\tilde\varphi$
tends to $\varphi$ (as $n \to \infty$); therefore we can form (using time series
theoretic statistical methods) estimators $\hat d$ that tend to $d$, which can be
used to test whether $d$ is identically 1 (which is equivalent to the null
hypothesis of independence). The multivariate case discussed in this section
seems to me to demonstrate the power of reducing statistical problems to
estimation of densities. If the only statistic one works with is the
distribution function $\tilde D(u_1,\ldots,u_m)$, one is confronted with the
difficult task of testing whether it is significantly different from the uniform
distribution function $u_1 u_2 \cdots u_m$. Then if one flunks this test, and
rejects the assumption of independent components of the random vector X, one has
no means of modeling the dependence.


The empirical regression-distribution function $\tilde D(u_1,\ldots,u_m)$ is a
purely discrete distribution which assigns mass 1/n to the n points

$$\left( \frac{R_1(X^1_j) - 1}{n}, \ldots, \frac{R_m(X^m_j) - 1}{n} \right), \quad j = 1,\ldots,n,$$

which are the rank vectors of the n random vectors $(X^1_j,\ldots,X^m_j)$ for
j = 1,...,n; here $R_k(X^k_j)$ denotes the rank of $X^k_j$ among
$X^k_1,\ldots,X^k_n$.


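Asymptotically our conclusions are unchanged if we take as our raw
estimator $\tilde D(u_1,\ldots,u_m)$ of $D(u_1,\ldots,u_m)$ the purely discrete
distribution function which assigns mass 1/n to the n points

$$\left( \frac{R_1(X^1_j)}{n}, \ldots, \frac{R_m(X^m_j)}{n} \right), \quad j = 1,\ldots,n ;$$

then the raw estimator of $\varphi(v_1,\ldots,v_m)$ is

$$\tilde\varphi(v_1,\ldots,v_m) = \frac{1}{n} \sum_{j=1}^{n}
\exp\left\{ 2\pi i \left( v_1 \frac{R_1(X^1_j)}{n} + \cdots + v_m \frac{R_m(X^m_j)}{n} \right) \right\} .$$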
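The raw estimator of the Fourier-Stieltjes transform is a plain average of complex exponentials evaluated at the rescaled rank vectors. The following Python sketch illustrates this for a small sample. It assumes no ties among the observations and uses the points $R_k/n$; the function names `ranks` and `phi_tilde` are illustrative and do not come from the original text.

```python
import cmath

def ranks(values):
    """Return the 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def phi_tilde(columns, v):
    """Raw estimator of phi(v_1,...,v_m).

    columns: list of m data columns, each of length n (column k holds the
             k-th components of the n observed vectors).
    v:       tuple of m integers (v_1,...,v_m).
    """
    n = len(columns[0])
    rank_columns = [ranks(col) for col in columns]
    total = 0j
    for j in range(n):
        # phase = v_1 R_1/n + ... + v_m R_m/n for the j-th observation
        phase = sum(vk * rk[j] / n for vk, rk in zip(v, rank_columns))
        total += cmath.exp(2j * cmath.pi * phase)
    return total / n
```

For example, `phi_tilde([x, y], (1, -1))` has modulus close to 1 when the two rank sequences agree, and moduli near zero at all non-zero frequencies are what one expects under independence.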


In the two-dimensional case (m = 2) we denote the observed data by
$(X_1,Y_1),\ldots,(X_n,Y_n)$. Then the n jump points of $\tilde D$ are of the form

$$\left( \frac{j}{n}, \frac{R_j}{n} \right), \quad j = 1,\ldots,n,$$

where $R_j$ is the rank among the Y's of that Y-value corresponding
to the X-value with rank j. The well-known rank tests for independence
(see Hájek (1969)) may be expressed in terms of the vector $R_1,\ldots,R_n$ as
follows:

Spearman test: $S = \sum_{j=1}^{n} j R_j$

Quadrant test: $S = \sum_{j \ge \frac{1}{2}n + 1}^{n} e(R_j - \tfrac{1}{2}n - 1)$

Kendall rank correlation coefficient: $K = \sum_{i=1}^{n} \sum_{j>i} e(R_j - R_i)$

Therefore  one  may  readily  establish  the  connection  between  our  time  series 
theoretic  approach  to  tests  for  independence  and  conventional  tests. 
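To make the rank-vector formulation concrete, here is a minimal Python sketch that builds $R_1,\ldots,R_n$ from paired data and evaluates the Spearman and Kendall sums. It assumes no ties, and it assumes that e(x) is the unit step function (1 for x >= 0, 0 otherwise), which the text does not define explicitly; all function names are illustrative.

```python
def rank_vector(xs, ys):
    """R_j = rank among the Y's of the Y-value paired with the X of rank j."""
    n = len(xs)
    by_x = sorted(range(n), key=lambda i: xs[i])
    by_y = sorted(range(n), key=lambda i: ys[i])
    y_rank = {i: r for r, i in enumerate(by_y, start=1)}
    return [y_rank[i] for i in by_x]

def step(x):
    """Assumed definition of e(x): unit step function."""
    return 1 if x >= 0 else 0

def spearman_S(R):
    """S = sum over j of j * R_j."""
    return sum(j * Rj for j, Rj in enumerate(R, start=1))

def kendall_K(R):
    """K = number of pairs i < j with R_j above R_i (concordant pairs)."""
    n = len(R)
    return sum(step(R[j] - R[i]) for i in range(n) for j in range(i + 1, n))
```

With perfectly increasing data both sums reach their maximum; with perfectly decreasing data `kendall_K` returns 0.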

To test the hypothesis of independence (regression density identically 1)
one may be willing to assume a family of alternative hypotheses indexed by a
parameter $\theta$ under which the regression-density may be represented

$$d_\theta(u_1,\ldots,u_m) = 1 + \theta\, \delta(u_1,\ldots,u_m)$$

for $\theta$ close to 0. Then an asymptotic likelihood ratio statistic for
testing independence is

$$R_\delta = \int_0^1 \cdots \int_0^1 \delta(u_1,\ldots,u_m) \, d\tilde D(u_1,\ldots,u_m) .$$

In the case m = 2, the regression-density function of the bivariate normal
distribution, denoted $d_\rho(u_1,u_2)$, is a function of the correlation
coefficient $\rho$ such that

$$\delta(u_1,u_2) = \left. \frac{\partial}{\partial\rho} d_\rho(u_1,u_2) \right|_{\rho = 0}
= \Phi^{-1}(u_1)\, \Phi^{-1}(u_2) .$$

Therefore our approach yields

$$R_\delta = \int_0^1 \int_0^1 \Phi^{-1}(u_1)\, \Phi^{-1}(u_2) \, d\tilde D(u_1,u_2)$$

as an optimum non-parametric statistic for testing independence against
bivariate normal dependence. The statistic $R_\delta$ is the Fisher-Yates or
normal scores statistic well studied in the theory of non-parametric statistics.
The Spearman and quadrant tests are linear rank statistics corresponding to the
weight functions

Spearman: $\delta(u_1,u_2) = u_1 u_2$

Quadrant: $\delta(u_1,u_2) = e(u_1 - \tfrac{1}{2})\, e(u_2 - \tfrac{1}{2})$

K is a linear function of $\int_0^1 \int_0^1 \tilde D(u,v) \, d\tilde D(u,v)$.
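The normal scores statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's exact definition: the scores are evaluated at $R/(n+1)$ instead of $R/n$ so that $\Phi^{-1}$ is never evaluated at 1, ties are assumed absent, and the function names are hypothetical.

```python
from statistics import NormalDist

def ranks(values):
    """Return the 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def normal_scores_statistic(xs, ys):
    """Average of Phi^{-1}(R_x/(n+1)) * Phi^{-1}(R_y/(n+1)) over the sample."""
    n = len(xs)
    inv = NormalDist().inv_cdf          # standard normal quantile function
    rx, ry = ranks(xs), ranks(ys)
    return sum(inv(rx[j] / (n + 1)) * inv(ry[j] / (n + 1)) for j in range(n)) / n
```

The statistic is positive for increasing association, negative for decreasing association, and close to zero under independence.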

The concept of minimum divergence estimation (defined in the introduction
to this chapter) can be illustrated in the present context. To estimate
the correlation coefficient $\rho$ of the bivariate normal distribution, the
minimum divergence estimator $\hat\rho$ is the value $\rho$ at which

$$J(\rho) = - \int_0^1 \int_0^1 \log d_\rho(u_1,u_2) \, d\tilde D(u_1,u_2)$$

achieves its minimum. By solving $J'(\rho) = 0$ one may show that
$\hat\rho = R_\delta$.

Kimeldorf and Sampson (1975) list parametric bivariate regression-
densities corresponding to various multivariate distributions; one could
estimate their parameters using J and H divergence functionals of
$\tilde D(u_1,u_2)$.


Multivariate Density Estimation: An estimator $\hat d(u_1,u_2)$ leads to an
estimator of $f(x_1,x_2)$, using the relation

$$f(Q_1(u_1), Q_2(u_2)) = d(u_1,u_2)\, f_1Q_1(u_1)\, f_2Q_2(u_2) .$$


Nonparametric Regression: An outstanding problem of statistics is
the estimation of the non-parametric regression of $X_2$ on $X_1$ in the sense
of the conditional mean

$$E[X_2 \mid X_1 = x_1] = \int_{-\infty}^{\infty} x_2\, f_{2|1}(x_2 \mid x_1) \, dx_2
= \int_{-\infty}^{\infty} x_2\, \frac{f(x_1,x_2)}{f_1(x_1)} \, dx_2 .$$

By making the change of variable $x_1 = Q_1(u_1)$, $x_2 = Q_2(u_2)$,
this formula can be rewritten to yield the following remarkable theorem.

Theorem: (Regression-Density Formula for Conditional Expectation
and Non-Parametric Regression)

$$E[X_2 \mid X_1 = Q_1(u_1)] = \int_0^1 Q_2(u_2)\, d(u_1,u_2) \, du_2$$

which justifies calling $d(u_1,u_2)$ the regression density; note that

$$d(u_1,u_2) = \frac{f(Q_1(u_1), Q_2(u_2))}{f_1Q_1(u_1)\, f_2Q_2(u_2)} .$$


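If one estimates $Q_2(u_2)$ by the empirical quantile function
$\tilde Q_2(u_2)$, the corresponding estimated conditional expectation is

$$\hat E[X_2 \mid X_1 = Q_1(u_1)] = \sum_{j=1}^{n} X^2_{(j)} \int_{(j-1)/n}^{j/n} \hat d(u_1,u_2) \, du_2 ,$$

where $X^2_{(j)}$ denotes the j-th order statistic of the second components. Each
integral is an increment in $u_2$ of $\hat D_1(u_1,u_2)$, where
$\hat D_1(u_1,u_2)$ is an estimator of

$$D_1(u_1,u_2) = \int_0^{u_2} d(u_1,u') \, du' = \frac{\partial}{\partial u_1} D(u_1,u_2) .$$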


The approach to non-parametric multivariate density estimation and
non-parametric regression outlined above (whose theory and practice remain to
be investigated) appears to show that one can estimate regressions without
estimating probability laws!


One often prefers to calculate regressions as conditional quantile
functions; then one can proceed as follows. An expression for the conditional
distribution function of $X_2$ given $X_1$ is

$$F_{2|1}(x_2 \mid x_1) = \int_0^{u_2} d(F_1(x_1), u') \, du' = D_1(F_1(x_1), u_2)$$

where $u_2 = F_2(x_2)$. It follows that the conditional quantile function of
$X_2$ given $X_1$ is given by

$$Q_{2|1}(p \mid x_1) = Q_2\, D_1^{-1}(F_1(x_1), p) ,$$

where $D_1^{-1}(u_1, p)$ denotes the inverse of $D_1(u_1, u_2)$ in its second
argument. In words, the conditional quantile function equals the unconditional
quantile function with a change of variable $u = D_1^{-1}(F_1(x_1), p)$.

While we recommend Fourier theoretic methods of estimating $D_1(u_1,u_2)$,
it should be noted that a quick and dirty estimator can be provided by a
"naive k-nearest neighbor" estimator


- a '“<“1 


To understand the dramatic nature of our approach to non-parametric
regression imagine a scatter diagram of points $(X_j, Y_j)$, j = 1,2,...,n, in
the plane. One seeks to fit a smooth curve y = g(x) through the points. A
typical criterion of curve fitting might be to: find g to minimize

$$\frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(X_j) \}^2 + \lambda \int_a^b \{ g^{(m)}(x) \}^2 \, dx$$

where $g^{(m)}$ is the m-th derivative of g, assumed to exist over some specified
interval a to b. The solution is then a polynomial spline of degree
2m - 1 [see Wahba (1976)]. Rather than choose a function g by such an
optimization criterion (which is inevitably ad hoc and still requires one to
specify $\lambda$, m, a and b) we are proposing that one adopt as one's "optimal
smooth curve" a curve of the form

$$y = \hat g(x) = \int_0^1 \hat Q_Y(u)\, \hat d(\hat F_X(x), u) \, du$$

where $\hat Q_Y(u)$ is an estimator of the quantile function of the Y-values,
$\hat F_X$ is an estimator of the distribution function of X-values, and
$\hat d(s,t)$ is the estimated regression-density function. How does one explain
to a numerical analyst what are the optimizing properties of the procedure we
are proposing?


Multi-dimensional non-parametric regression: The foregoing results
can be extended to multi-dimensions. We state only a formula for the conditional
expectation of $X_m$ on $X_1,\ldots,X_{m-1}$:

$$E[X_m \mid X_1 = Q_1(u_1), \ldots, X_{m-1} = Q_{m-1}(u_{m-1})]
= \int_0^1 Q_m(u_m)\, \frac{d_m(u_1,\ldots,u_m)}{d_{m-1}(u_1,\ldots,u_{m-1})} \, du_m$$

where

$$d_m(u_1,\ldots,u_m) = \frac{f(Q_1(u_1),\ldots,Q_m(u_m))}{f_1Q_1(u_1) \cdots f_mQ_m(u_m)}$$

is the regression-density function of $X_1,\ldots,X_m$ (and $d_{m-1}$ is the
regression-density function of $X_1,\ldots,X_{m-1}$).

1 1 

Asymptotic distribution of statistics of the form
$T_n = \int_0^1 \int_0^1 J(s)\, K(t) \, d\tilde D(s,t)$.
The work of Ruymgaart (1974) leads us to the following roughly stated
limit theorem:

$$\sqrt{n}\, (T_n - \mu) \ \text{is asymptotically} \ N(0, \sigma^2)$$

where

$$\mu = \int_0^1 \int_0^1 J(s)\, K(t) \, dD(s,t)$$

$$\sigma^2 = \int_0^1 \int_0^1 |V(s,t)|^2 \, dD(s,t)$$

$$V(s,t) = J(s)\, K(t) - \int_0^1 \int_0^1 J(u)\, K(v) \, dD(u,v)$$
$$\quad + \int_0^1 \int_0^1 [e(u - s) - u]\, J'(u)\, K(v) \, dD(u,v)$$
$$\quad + \int_0^1 \int_0^1 [e(v - t) - v]\, J(u)\, K'(v) \, dD(u,v) .$$

Under the null hypothesis D(s,t) = st,

$$\mu = \int_0^1 J(s) \, ds \int_0^1 K(t) \, dt$$

$$\sigma^2 = \int_0^1 \int_0^1 |V(s,t)|^2 \, ds \, dt$$

$$V(s,t) = \left\{ J(s) - \int_0^1 J(u) \, du \right\} \left\{ K(t) - \int_0^1 K(v) \, dv \right\} .$$

Extending these results to $J(s) = e^{2\pi i v_1 s}$, $K(t) = e^{2\pi i v_2 t}$ one
obtains that under the null hypothesis of independence
$\{ \tilde\varphi(v_1,v_2) ;\ v_1, v_2 = \pm 1, \pm 2, \ldots \}$
are asymptotically independent $N(0, 1/n)$.


Joint distribution of the sample quantile functions of two dependent
random variables $X^1$ and $X^2$. It has been noted in Section 1 that the
modified empirical quantile function deviations

$$\tilde Q_{j,n}(u) = \sqrt{n}\, f_jQ_j(u) \{ \tilde Q_j(u) - Q_j(u) \}$$

are such that $\tilde Q_{j,n}(t)$ is asymptotically $N(0, t(1-t))$; further
$\tilde Q_{j,n}(s)$ and $\tilde Q_{j,n}(t)$ have asymptotic covariance
s(1 - t) when s < t. Weiss (1964) proves that asymptotically


Cov(Q.^n(^‘'>  . 


S ;S  t , 


defining

Using this result, one could obtain asymptotically efficient unbiased
estimators, from incomplete samples, of the common mean $\mu$ of bivariate
normal random variables X and Y with unknown unequal variances and unknown
covariance (for other estimators, see Hamdan, Pirie, and Khuri (1970)).

4. Time Series Analysis and Autoregressive Model Approximation

The  density  estimation  problem  (which  we  claim  is  a canonical  problem 
to  whicli  one  can  transform  many  basic  problems  of  statistical  inference)  first 
arose  in  the  analysis  of  stationary  time  series. 

Let Y(t), t = 0, ±1, ±2, ... be a zero mean covariance stationary
normal time series; its probability law is then specified by either the
covariance function $R(v) = E[Y(t)\, Y(t + v)]$, or the variance R(0) and the
correlation function

$$\rho(v) = \frac{R(v)}{R(0)} = \mathrm{Corr}[Y(t), Y(t + v)] .$$

The  covariance  function  has  a basic  mathematical  property  called 
positive  definiteness  and  defined  as  follows: 

$$\sum_{j,k=1}^{n} c_j\, c_k^{*}\, R(v_j - v_k) \ge 0$$

for any integer n, complex numbers $c_1,\ldots,c_n$, and indices $v_1,\ldots,v_n$.

This property implies that R(v) has a spectral representation which we
write

$$R(v) = \int_0^1 e^{2\pi i u v} \, dF(u) ,$$

where F(u) is a non-decreasing function with F(0) = 0. Correspondingly,
$\rho(v)$ has a spectral representation which we write

$$\rho(v) = \int_0^1 e^{2\pi i u v} \, dF(u)$$

where F(u) is now a distribution function (a non-decreasing function with
F(0) = 0 and F(1) = 1). We call F(·) the spectral distribution function.

In developing the statistical theory of stationary time series analysis
we always make the assumption that $\sum_v |\rho(v)| < \infty$. Then the
derivative $f(u) = F'(u)$ exists and is called the spectral density function; in
terms of f(·) we have the spectral representations

$$\rho(v) = \int_0^1 e^{2\pi i u v} f(u) \, du , \quad v = 0, \pm 1, \pm 2, \ldots ,$$

$$f(u) = \sum_{v=-\infty}^{\infty} e^{-2\pi i u v} \rho(v) , \quad 0 \le u \le 1 .$$

Our notation should be noted; we use t to denote "time," v to
denote "lag" between two times, and u to denote "frequency" when its domain
is $0 \le u \le 1$; when frequency has other intervals in which it varies it
is customarily denoted by letters such as $\omega$ and f and the intervals are
$-\pi \le \omega \le \pi$ and $-0.5 \le f \le 0.5$. Note that, for a real valued
time series, $f(u) = f(1 - u)$ and $\rho(v) = \rho(-v)$.

The mathematical existence of f(u) is deduced from the fact that
$\rho(v)$ is an integrable positive-definite function; the interpretation of f(u)
is deduced from the theory of linear filters.

To transform a stationary time series Y(·) to a new stationary time
series Z(·), one generally uses linear time-invariant transformations
(called filters) of the form

$$Z(t) = \sum_{j=0}^{\infty} b_j\, Y(t - j) .$$

We like to introduce an operator (call it B since its coefficients have
been denoted $b_j$) such that one can write Z(·) = B Y(·). Define an operator
L (called the lag operator or backward shift operator):

Z(·) = L Y(·) iff Z(t) = Y(t - 1)

or equivalently L Y(t) = Y(t - 1); note $L^2 Y(t) = Y(t - 2)$ and in general
$L^n Y(t) = Y(t - n)$ for any integer n. Introduce the power series

$$B(z) = \sum_{j=0}^{\infty} b_j z^j ;$$

then we can write B = B(L) and Z(t) = B(L) Y(t). We call B(z) the
transfer function of the filter B(L). Regarded as a function of
$z = e^{2\pi i u}$, we call $B(e^{2\pi i u})$ the frequency response function. The
notation has now been introduced to answer a basic question of stationary time
series modeling: What are the properties of a time series Z(·) which arises as
the output of a linear filter B(L) whose input is a stationary time series Y(·)?

Theorem. If Z(t) = B(L) Y(t), where Y(·) is a zero mean stationary
normal time series with spectral density function $f_Y(u)$, $0 \le u \le 1$,
then Z(·) is a zero mean stationary normal time series with spectral density
function $f_Z(u)$, $0 \le u \le 1$, given by

$$f_Z(u) = |B(e^{2\pi i u})|^2 f_Y(u) .$$
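The filter theorem can be checked numerically in the simplest setting. The sketch below assumes a finite filter with real coefficients applied to unit-variance white noise (so that $f_Y(u) = 1$), computes the output covariances directly, and compares their Fourier sum with the squared modulus of the frequency response. The function names are illustrative.

```python
import cmath
import math

def frequency_response(b, u):
    """B(e^{2 pi i u}) for real filter coefficients b = [b_0, b_1, ...]."""
    z = cmath.exp(2j * math.pi * u)
    return sum(bj * z ** j for j, bj in enumerate(b))

def output_covariance(b, v):
    """R_Z(v) for unit-variance white-noise input: sum_j b_j b_{j+|v|}."""
    v = abs(v)
    return sum(b[j] * b[j + v] for j in range(len(b) - v))

def output_spectral_density(b, u):
    """Fourier sum of the output covariances over all non-zero lags."""
    m = len(b) - 1
    total = sum(cmath.exp(-2j * math.pi * u * v) * output_covariance(b, v)
                for v in range(-m, m + 1))
    return total.real
```

For any coefficient list `b` and frequency `u`, `output_spectral_density(b, u)` agrees with `abs(frequency_response(b, u)) ** 2` up to rounding error.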

Since many questions about a stationary time series Y(·) can be
readily answered in terms of its spectral density function f(u), it is
natural that the estimation of f(u) from a finite sample {Y(t), t = 1,...,T}
should be one of the central problems of the theory of time series analysis.

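Natural raw estimators $\tilde\rho$ and $\tilde f$ are obtained as follows:
for v = 0,1,...,T-1

$$\tilde\rho(v) = \sum_{t=1}^{T-v} Y(t)\, Y(t + v) \Big/ \sum_{t=1}^{T} Y^2(t)$$

while $\tilde\rho(v) = 0$ for $v \ge T$ and $\tilde\rho(-v) = \tilde\rho(v)$; one
may show that

$$\tilde\rho(v) = \int_0^1 e^{2\pi i u v} \tilde f(u) \, du ,$$

where

$$\tilde f(u) = \sum_{|v| < T} e^{-2\pi i u v} \tilde\rho(v)
= \left| \sum_{t=1}^{T} Y(t)\, e^{2\pi i u t} \right|^2 \Big/ \sum_{t=1}^{T} Y^2(t) .$$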

The convergence properties (as $T \to \infty$) of these estimators are as
follows: $\tilde\rho(v) \to \rho(v)$ but $\tilde f(u) \not\to f(u)$. Indeed,
$\tilde f(u)$ is in practice a very wiggly function which behaves like white noise
in the sense that $\tilde f(u_1)$ and $\tilde f(u_2)$ are asymptotically independent
for any fixed $u_1 \ne u_2$. The distribution of $\tilde f(u)$ is asymptotically
exponential with mean f(u). This is the point at which the modern era of time
series analysis started (see, for example, Tukey (1959)): how to pass from wiggly
estimators $\tilde f(u)$ to smooth estimators $\hat f(u)$ which are consistent
(and, if possible, asymptotically "efficient") estimators of f(u). In practice
one might use and compare several estimators $\hat f(u)$ formed from the single
finite sample of observations.

Three main approaches have developed for forming smooth estimators
which are called the direct approach, the indirect approach, and the
autoregressive approach.

Each approach considers estimators or smoothers $\hat f(u)$ of a different
form:

(i) Direct approach

$$\hat f(u) = \int_0^1 K(u - s)\, \tilde f(s) \, ds$$

for suitable kernels K.

(ii) Indirect approach

$$\hat f(u) = \sum_{v=-\infty}^{\infty} e^{-2\pi i u v}\, k(v)\, \tilde\rho(v)$$

for suitable weights k.

(iii) Autoregressive approach

$$\hat f_m(u) = \hat\sigma_m^2 \left| 1 + \hat\alpha_1 e^{2\pi i u} + \cdots + \hat\alpha_m e^{2\pi i m u} \right|^{-2}$$

for a suitable integer m (called the order), and coefficients
$\hat\sigma_m^2, \hat\alpha_1, \ldots, \hat\alpha_m$ which are estimated from the sample.

The extensive literature available on the properties of these methods
of estimating f(u) enables us to claim that we have successfully shown how
to transform diverse statistical problems to a problem (density estimation)
which has been "successfully" solved.

However, I would like to add a further claim: one can develop the
autoregressive method so that it provides a "most successful" or "optimum"
solution of the density estimation problem.

The name autoregressive approach comes from the notion of an
autoregressive scheme. One can show that the true spectral density f(u) is of
the form

$$f(u) = \sigma_m^2 \left| g_m(e^{2\pi i u}) \right|^{-2}$$

where

$$g_m(z) = 1 + \alpha_1 z + \cdots + \alpha_m z^m$$

iff $\rho(v)$ satisfies the difference equation

$$\rho(-v) + \alpha_1 \rho(1 - v) + \cdots + \alpha_m \rho(m - v) = 0 , \quad v > 0$$

iff Y(t) satisfies the stochastic difference equation

$$Y(t) + \alpha_1 Y(t-1) + \cdots + \alpha_m Y(t-m) = \varepsilon(t)$$

where the process $\varepsilon(t)$ obeys the conditions

$$E|\varepsilon(t)|^2 = \sigma_m^2$$

$$E[\varepsilon(t)\, \varepsilon(t+v)] = 0 \quad \text{for } v \ne 0$$

$$E[Y(s)\, \varepsilon(t)] = 0 \quad \text{for } s < t .$$

A time series is called white noise iff its correlation function
$\rho(v) = 0$ for $v \ne 0$.

Modeling a time series by an autoregressive scheme is convenient
because one can then: (i) readily estimate the parameters of the model,
and (ii) solve the prediction problem: given the values Y(t), Y(t-1), ...
to predict Y(t+1), Y(t+2), ...

To a general spectral density f(u) satisfying the conditions that
$\log f(u)$ and $f^{-1}(u)$ are integrable we can associate a sequence of
autoregressive approximators $f_m(u)$, m = 0,1,... First, $f_0(u) = 1$; to
define $f_m(u)$ for m > 0 introduce the minimization problem: let
$\alpha_{1,m},\ldots,\alpha_{m,m}$ be the values at which

$$J_m(\alpha_1,\ldots,\alpha_m) = \int_0^1 \left| 1 + \alpha_1 e^{2\pi i u} + \cdots + \alpha_m e^{2\pi i u m} \right|^2 f(u) \, du$$

achieves its minimum value, and let $\sigma_m^2$ denote the minimum value so that

$$\sigma_m^2 = J_m(\alpha_{1,m},\ldots,\alpha_{m,m}) .$$

Define

$$g_m(z) = 1 + \alpha_{1,m} z + \cdots + \alpha_{m,m} z^m , \qquad
f_m(u) = \sigma_m^2 \left| g_m(e^{2\pi i u}) \right|^{-2} .$$

The coefficients in $g_m(z)$ can be determined from the normal equations

$$\int_0^1 g_m(e^{2\pi i u})\, e^{-2\pi i u v} f(u) \, du = 0 , \quad v = 1,\ldots,m$$

which is equivalent to

$$\rho(-v) + \alpha_{1,m}\, \rho(1 - v) + \cdots + \alpha_{m,m}\, \rho(m - v) = 0 , \quad v = 1,\ldots,m$$

and

$$\sigma_m^2 = \int_0^1 \left| g_m(e^{2\pi i u}) \right|^2 f(u) \, du
= \rho(0) + \alpha_{1,m}\, \rho(1) + \cdots + \alpha_{m,m}\, \rho(m) .$$

Conditions for the convergence of $f_m(u)$ to f(u) are stated in Geronimus
(1960); in addition to $\log f(u)$ and $f^{-1}(u)$ both being integrable we must
assume a certain sequence of partial correlation coefficients is absolutely
summable. One can then show that one can represent

$$f(u) = \sigma_\infty^2 \left| g_\infty(e^{2\pi i u}) \right|^{-2}$$

$$g_\infty(z) = 1 + \alpha_{1,\infty} z + \cdots + \alpha_{m,\infty} z^m + \cdots$$


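Estimators $\hat f_m(u)$ of f(u) are easily obtained as follows. Let
$\hat\alpha_1,\ldots,\hat\alpha_m$ be the solutions of the sample normal equations

$$\tilde\rho(-v) + \hat\alpha_1 \tilde\rho(1 - v) + \cdots + \hat\alpha_m \tilde\rho(m - v) = 0 , \quad v = 1,\ldots,m .$$

Define

$$\hat g_m(z) = 1 + \hat\alpha_1 z + \cdots + \hat\alpha_m z^m$$

$$\hat\sigma_m^2 = 1 + \hat\alpha_1 \tilde\rho(1) + \cdots + \hat\alpha_m \tilde\rho(m)$$

$$\hat f_m(u) = \hat\sigma_m^2 \left| \hat g_m(e^{2\pi i u}) \right|^{-2} .$$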
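The normal equations form a small linear system in the autoregressive coefficients. The sketch below solves that system by Gaussian elimination for given correlations and then evaluates the resulting autoregressive spectral density. It is an illustration under the sign convention used here (coefficients on the left-hand side of the difference equation), not a reproduction of any particular program; all function names are hypothetical.

```python
import cmath
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_autoregression(rho, order):
    """Solve sum_j alpha_j rho(j - v) = -rho(v), v = 1..order; rho[0] must be 1."""
    a = [[rho[abs(j - v)] for j in range(1, order + 1)] for v in range(1, order + 1)]
    b = [-rho[v] for v in range(1, order + 1)]
    alpha = solve_linear(a, b) if order > 0 else []
    sigma2 = rho[0] + sum(alpha[j - 1] * rho[j] for j in range(1, order + 1))
    return alpha, sigma2

def ar_spectral_density(alpha, sigma2, u):
    """sigma2 * |1 + alpha_1 e^{2 pi i u} + ... + alpha_m e^{2 pi i m u}|^{-2}."""
    g = 1.0 + sum(a * cmath.exp(2j * math.pi * u * (j + 1)) for j, a in enumerate(alpha))
    return sigma2 / abs(g) ** 2
```

With correlations $\rho(v) = 0.6^{v}$ of a first-order scheme, a fit of order 3 returns a first coefficient of -0.6, higher coefficients equal to zero up to rounding, and residual variance 0.64.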


The functions $\hat f_0(u),\ldots,\hat f_{T-1}(u)$ can be regarded as a sequence of
functions which proceed from the smoothest constant function $\hat f_0(u) = 1$ to
the wiggliest function $\hat f_{T-1}(u) = \tilde f(u)$. One desires to find an
intermediate value of m, denoted $\hat m$, such that $\hat f_{\hat m}(u)$ can be
regarded as not the smoothest estimator of f(u) but as the "most likely"
estimator of f(u).

For this purpose one needs a criterion to determine $\hat m$ (called an
order-determination criterion). Such criteria have been developed by a number of
authors using various conceptual frameworks. The approach of Akaike (1974)
is particularly well known. The discussion of this question requires an
extensive paper by itself. Space permits me only to introduce my own criterion,
which I call CAT (criterion autoregressive transfer function).

Rather than directly examining the properties of $\hat f_m(u)$ as an estimator
of f(u), I focus on the properties of $\hat g_m(z)$ as an estimator of
$g_\infty(z)$. We would like to choose $\hat g(z)$ to minimize the overall mean
square error

$$J = \int_0^1 E\left| \hat g(e^{2\pi i u}) - g_\infty(e^{2\pi i u}) \right|^2 f(u) \, du
= \sigma_\infty^2 \int_0^1 E\left| \frac{\hat g - g_\infty}{g_\infty} \right|^2 du .$$

Now overall mean square error can be expressed as a sum of overall variance

$$V = \int_0^1 \mathrm{Var}[\hat g]\, f(u) \, du$$

and overall squared bias

$$B = \int_0^1 \left| E\hat g - g_\infty \right|^2 f(u) \, du .$$

It can be shown (see Parzen (1976)) that the degree m polynomial best
approximating $g_\infty$ is $g_m$ multiplied by a suitable constant. Therefore we
restrict ourselves to estimators $\hat g$ of the form $\hat g = \hat g_{\hat m}$
where $\hat m$ minimizes the function of m

$$J(m) = \int_0^1 E\left| \hat g_m - g_\infty \right|^2 f(u) \, du
= \int_0^1 \mathrm{Var}(\hat g_m)\, f(u) \, du + \int_0^1 \left| g_m - g_\infty \right|^2 f(u) \, du .$$

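We are able to obtain a remarkable approximate evaluation of J(m) by
changing our definitions. Define

$$\gamma_\infty(z) = \sigma_\infty^{-2}\, g_\infty(z) , \qquad \gamma_m(z) = \sigma_m^{-2}\, g_m(z) ,$$

with $\hat\gamma_m(z) = \hat\sigma_m^{-2}\, \hat g_m(z)$ the corresponding sample
quantity, and change the definition of J(m) to

$$J(m) = \int_0^1 \mathrm{Var}(\hat\gamma_m)\, f(u) \, du + \int_0^1 \left| \gamma_\infty - \gamma_m \right|^2 f(u) \, du .$$

One can show that the second term (representing the overall bias) equals
$\sigma_\infty^{-2} - \sigma_m^{-2}$, and that

$$J(m) \approx \frac{1}{T} \sum_{j=1}^{m} \sigma_j^{-2} + \sigma_\infty^{-2} - \sigma_m^{-2} .$$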


This remarkable formula motivates the following order determination
criterion: given a sample of size T, choose $\hat m$ to minimize the function
CAT(m) calculated from the sample as follows:

$$\mathrm{CAT}(0) = -\left( 1 + \frac{1}{T} \right)$$

$$\mathrm{CAT}(m) = \frac{1}{T} \sum_{j=1}^{m} \tilde\sigma_j^{-2} - \tilde\sigma_m^{-2} , \quad m = 1, 2, \ldots$$

where $\tilde\sigma_m^2$ is an "unbiased" estimator of $\sigma_m^2$ defined by

$$\tilde\sigma_m^2 = \frac{T}{T - m}\, \hat\sigma_m^2 .$$

When $\hat m = 0$, we estimate f(u) to be the constant 1, and accept
the hypothesis that the sample could have been drawn from a white noise
process.
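A minimal Python sketch of the CAT order-determination rule follows. It is a sketch under stated assumptions: the residual variances are obtained by the Levinson-Durbin recursion (a standard way of solving the normal equations order by order, not named in the text), the "unbiased" residual variance is taken to be T/(T - m) times the raw residual variance, and the function names are illustrative.

```python
def residual_variances(rho, max_order):
    """Residual variances for orders 0..max_order via Levinson-Durbin.

    rho: correlations rho[0..max_order] with rho[0] = 1.
    """
    sigma2 = [rho[0]]
    alpha = []
    for m in range(1, max_order + 1):
        acc = rho[m] + sum(alpha[j - 1] * rho[m - j] for j in range(1, m))
        k = -acc / sigma2[-1]                      # m-th reflection coefficient
        alpha = [alpha[j - 1] + k * alpha[m - j - 1] for j in range(1, m)] + [k]
        sigma2.append(sigma2[-1] * (1.0 - k * k))
    return sigma2

def cat_values(rho, T, max_order):
    """CAT(0), CAT(1), ..., CAT(max_order) for a sample of size T."""
    sig = residual_variances(rho, max_order)
    values = [-(1.0 + 1.0 / T)]
    running = 0.0
    for m in range(1, max_order + 1):
        unbiased = T / (T - m) * sig[m]            # assumed "unbiased" variance
        running += 1.0 / unbiased
        values.append(running / T - 1.0 / unbiased)
    return values

def cat_order(rho, T, max_order):
    """Order minimizing CAT over 0..max_order."""
    values = cat_values(rho, T, max_order)
    return min(range(len(values)), key=values.__getitem__)
```

For white-noise correlations the selected order is 0, and for the correlations $0.6^{v}$ of a first-order scheme with T = 100 the selected order is 1.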


An order determination procedure can be regarded as a procedure for
adaptively determining from the sample a "most powerful" test statistic for
the null hypothesis of white noise. The meaning of this assertion requires
another paper to discuss.

It may be helpful to make some intuitive remarks about order-determining
criteria. The residual variances $\hat\sigma_m^2$ decrease as m increases so that
they do not decisively indicate which order m is long enough. The minimum of
the "unbiased residual variances" $\tilde\sigma_m^2$ usually exists; while
empirically it may on occasion choose the "right" order there is no conceptual
basis for its use. Akaike's criterion, to minimize

$$\mathrm{AIC}(m) = \log \hat\sigma_m^2 + \frac{2m}{T} ,$$

can be justified using an entropy maximization inference criterion. In recent
work, Wahba uses cross-validation inference criteria to determine smoothing
factors; her work can be directly applied to density estimation.

I would like to suggest a new criterion, motivated by the cross-validation
criteria of Wahba and which I call CV, whose order-determining properties
need to be examined:


CV(m)

To estimate a density function $d(u_1,\ldots,u_m)$ which is a function of
several variables, one cannot use autoregressive methods of approximation;
however, one can develop indirect methods of estimation where the weights
are chosen using cross-validation criteria. I believe we are justified in
claiming that one can empirically estimate density functions with almost no
prior assumptions.


5. Reliability Theory

Let $f_0Q_0(u)$ be the density quantile function of the exponential
distribution $f_0(x) = e^{-x}$; then $Q_0(u) = -\log(1-u)$ and
$f_0Q_0(u) = 1-u$. Let f(x) be the probability density of a non-negative random
variable X, with quantile-density function $q(u) = Q'(u)$; then

$$\int_0^1 f_0Q_0(u)\, q(u) \, du = \int_0^1 (1 - u)\, q(u) \, du
= \int_0^\infty \{ 1 - F(x) \} \, dx = \mu = E[X] .$$

Thus integrability of (1 - u) q(u) is equivalent to the mean being finite.
The integrand (1 - u) q(u) occurs frequently as it is related to the hazard
function

$$h(x) = \frac{f(x)}{1 - F(x)}$$

and the hazard quantile function

$$hQ(u) = \frac{1}{(1 - u)\, q(u)} .$$

Next define the distribution function

$$F_{res}(x) = \frac{1}{\mu} \int_0^x \{ 1 - F(y) \} \, dy$$

which is denoted $F_{res}$ as it is the distribution function of the residual
lifetime in a renewal process. The distribution function on $0 \le u \le 1$

$$D(u) = F_{res}(Q(u)) = \frac{1}{\mu} \int_0^{Q(u)} \{ 1 - F(y) \} \, dy
= \int_0^u (1 - t)\, q(t) \, dt \left\{ \int_0^1 (1 - t)\, q(t) \, dt \right\}^{-1}$$

has density $d(u) = (1 - u)\, q(u) / \mu$. The following hypotheses are
equivalent:

D(u) = u,
d(u) = 1,
$F_{res}(x) = F(x)$,
F(x) is the exponential distribution.

In other words, a test for exponentiality is provided by testing whether the
density function d(u) is constant.

A raw estimator of D(u) is provided by

$$\tilde D(u) = \int_0^u (1 - t) \, d\tilde Q(t) \Big/ \int_0^1 (1 - t) \, d\tilde Q(t) .$$

This function has been extensively studied by researchers in reliability theory
(especially Barlow and Van Zwet (1970), Barlow and Proschan (1976)) under the
name of the total time on test statistic.
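The total time on test statistic can be computed directly from the order statistics of the sample. The sketch below assumes the usual scaled form, in which the value at u = j/n is the sum of (n - i + 1) times the i-th spacing for i up to j, divided by the sum of all observations; this scaling and the function name are assumptions for illustration.

```python
def total_time_on_test(sample):
    """Return [D(0), D(1/n), ..., D(1)], the scaled total time on test values.

    sample: non-negative lifetimes. The j-th value accumulates
    (n - i + 1) * (X_(i) - X_(i-1)) for i = 1..j, with X_(0) = 0,
    and divides by the sample total.
    """
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)
    values = [0.0]
    running = 0.0
    previous = 0.0
    for i, x in enumerate(xs, start=1):
        running += (n - i + 1) * (x - previous)
        previous = x
        values.append(running / total)
    return values
```

For an exponential sample the returned values lie close to the diagonal j/n, which is the graphical form of the hypothesis D(u) = u.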

The statistic $\tilde D(u)$, $0 \le u \le 1$, also can be deduced from the
general one-sample theory outlined in Section 1; however, the derivation is
different in this section since it is directly motivated by a search for a test
of exponentiality. Section 1 provides tests for any specified distribution (more
precisely, a specified density-quantile or $f_0Q_0$ function). A list of
density-quantile functions of familiar univariate distributions is given in
Table I.


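Table I. Density-Quantile Functions

Name of Probability Law | Density f(x) | Quantile Q(u) | Density-Quantile fQ(u)
------------------------|--------------|---------------|-----------------------
Normal | $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ | $\Phi^{-1}(u)$ | $(2\pi)^{-1/2} \exp\{ -\tfrac{1}{2} |\Phi^{-1}(u)|^2 \}$
Log-normal | $x^{-1} \phi(\log x)$, x > 0 | $e^{\Phi^{-1}(u)}$ | $\phi\Phi^{-1}(u)\, e^{-\Phi^{-1}(u)}$
Exponential | $e^{-x}$, x > 0 | $-\log(1 - u)$ | $1 - u$
Pareto, $\beta > 0$ | $\beta^{-1} x^{-(1 + 1/\beta)}$, x > 1 | $(1 - u)^{-\beta}$ | $\beta^{-1} (1 - u)^{1 + \beta}$
Extreme Value | $e^{x} e^{-e^{x}}$ | $\log \log (1 - u)^{-1}$ | $(1 - u) \log (1 - u)^{-1}$
Weibull, $c = 1/\beta > 0$ | $c\, x^{c-1} e^{-x^{c}}$, x > 0 | $\{ \log (1 - u)^{-1} \}^{\beta}$ | $\beta^{-1} (1 - u) \{ \log (1 - u)^{-1} \}^{1 - \beta}$
Cauchy | $\pi^{-1} (1 + x^2)^{-1}$ | $\tan \pi (u - \tfrac{1}{2})$ | $\pi^{-1} \sin^2 \pi u$
Logistic | $e^{x} (1 + e^{x})^{-2}$ | $\log \{ u / (1 - u) \}$ | $u (1 - u)$
Double-exponential | $\tfrac{1}{2} e^{-|x|}$ | $\log 2u$, $u < \tfrac{1}{2}$; $-\log 2(1 - u)$, $u > \tfrac{1}{2}$ | $u$, $u < \tfrac{1}{2}$; $1 - u$, $u > \tfrac{1}{2}$
Uniform-reciprocal | $x^{-2}$, x > 1 | $(1 - u)^{-1}$ | $(1 - u)^2$

Score functions J(u)

Name of Probability Law | J(u)
------------------------|-----
Normal | $\Phi^{-1}(u)$
Exponential | 1
Double-Exponential | sign(2u - 1)
Logistic | 2u - 1
Cauchy | $-\sin 2\pi u$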
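Entries of such a table can be checked numerically by composing the density with the quantile function. The Python sketch below does this for five of the laws and also approximates a score function by a central difference, assuming the relation J(u) = -(fQ)'(u), which is consistent with the listed exponential and logistic entries but is not stated explicitly in this section. The dictionary `LAWS` and the function names are illustrative.

```python
import math

# name: (density f(x), quantile Q(u), density-quantile fQ(u))
LAWS = {
    "exponential": (lambda x: math.exp(-x),
                    lambda u: -math.log(1 - u),
                    lambda u: 1 - u),
    "extreme value": (lambda x: math.exp(x - math.exp(x)),
                      lambda u: math.log(-math.log(1 - u)),
                      lambda u: -(1 - u) * math.log(1 - u)),
    "cauchy": (lambda x: 1 / (math.pi * (1 + x * x)),
               lambda u: math.tan(math.pi * (u - 0.5)),
               lambda u: math.sin(math.pi * u) ** 2 / math.pi),
    "logistic": (lambda x: math.exp(x) / (1 + math.exp(x)) ** 2,
                 lambda u: math.log(u / (1 - u)),
                 lambda u: u * (1 - u)),
    "uniform-reciprocal": (lambda x: 1 / (x * x),
                           lambda u: 1 / (1 - u),
                           lambda u: (1 - u) ** 2),
}

def max_discrepancy(name, grid=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest |f(Q(u)) - fQ(u)| over the grid for the named law."""
    f, Q, fQ = LAWS[name]
    return max(abs(f(Q(u)) - fQ(u)) for u in grid)

def score(name, u, h=1e-6):
    """Central-difference approximation of J(u) = -(fQ)'(u) (assumed relation)."""
    fQ = LAWS[name][2]
    return -(fQ(u + h) - fQ(u - h)) / (2 * h)
```

Calling `max_discrepancy` for each name returns values at rounding-error level, and `score("logistic", u)` reproduces 2u - 1.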